Large-Scale Content-Based Matching of MIDI and Audio Files
Authors: Colin Raffel, Daniel P. W. Ellis
Abstract
MIDI files, when paired with corresponding audio recordings, can be used as ground truth for many music information retrieval tasks. We present a system which can efficiently match and align MIDI files to entries in a large corpus of audio content based solely on content, i.e., without using any metadata. The core of our approach is a convolutional network-based cross-modality hashing scheme which transforms feature matrices into sequences of vectors in a common Hamming space. Once represented in this way, we can efficiently perform large-scale dynamic time warping searches to match MIDI data to audio recordings. We evaluate our approach on the task of matching a huge corpus of MIDI files to the Million Song Dataset.
1. TRAINING DATA FOR MIR
Central to the task of content-based Music Information Retrieval (MIR) is the curation of ground-truth data for tasks of interest (e.g., timestamped chord labels for automatic chord estimation, beat positions for beat tracking, prominent melody time series for melody extraction, etc.). The quantity and quality of this ground truth is often instrumental in the success of MIR systems which utilize it as training data. Creating appropriate labels for a recording of a given song by hand typically requires person-hours on the order of the duration of the data, and so training data availability is a frequent bottleneck in content-based MIR tasks.
MIDI files that are time-aligned to matching audio can provide ground-truth information [8, 25] and can be utilized in score-informed source separation systems [9, 10]. A MIDI file can serve as a timed sequence of note annotations (a “piano roll”). It is much easier to estimate information such as beat locations, chord labels, or predominant melody from these representations than from an audio signal. A number of tools have been developed for inferring this kind of information from MIDI files [6, 7, 17, 19]. Halevy et al. [11] argue that some of the biggest successes in machine learning came about because “...a large training set of the input-output behavior that we seek to automate is available to us in the wild.” The motivation behind...
© Colin Raffel, Daniel P. W. Ellis. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: Colin Raffel, Daniel P. W. Ellis. “Large-Scale Content-Based Matching of MIDI and Audio Files”, 16th International Society for Music Information Retrieval Conference, 2015.
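The matching step described in the abstract (sequences of vectors in a common Hamming space, compared with dynamic time warping) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the authors' implementation: each binary hash vector is stored as a Python integer, the DTW is the plain unconstrained textbook form, the names `hamming`, `dtw_distance`, and `best_match` are hypothetical, and the example codes are made up. The convolutional network that would produce the codes is not shown.

```python
from __future__ import annotations


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer-encoded binary codes."""
    return bin(a ^ b).count("1")


def dtw_distance(x: list[int], y: list[int]) -> int:
    """Total Hamming cost of the cheapest monotonic alignment of x and y."""
    if not x or not y:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # prev[j] is the best cost of aligning the rows seen so far with y[:j].
    prev = [0] + [inf] * len(y)
    for a in x:
        curr = [inf] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            # Allowed steps: advance in x only, in y only, or in both.
            best = min(prev[j], curr[j - 1], prev[j - 1])
            curr[j] = hamming(a, b) + best
        prev = curr
    return prev[-1]


def best_match(query: list[int], corpus: dict[str, list[int]]) -> str:
    """Name of the corpus entry with the lowest DTW cost to the query."""
    return min(corpus, key=lambda name: dtw_distance(query, corpus[name]))


# Made-up 4-bit codes standing in for hashed MIDI and audio feature frames.
midi_codes = [0b0011, 0b0101, 0b1111]
audio_corpus = {
    "track_a": [0b0011, 0b0011, 0b0101, 0b1111],  # same codes, slower start
    "track_b": [0b1100, 0b1010, 0b0000],
}
match = best_match(midi_codes, audio_corpus)
```

Because each local distance reduces to an XOR followed by a bit count, the cost of one comparison is small, which is what makes searching a large corpus this way plausible. A practical system would likely also need pruning and path constraints, which this sketch omits.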
Similar resources
Polyphonic Audio Matching for Score Following and Intelligent Audio Editors
Getting computers to understand and process audio recordings in terms of their musical content is a difficult challenge. We describe a method in which general, polyphonic audio recordings of music can be aligned to symbolic score information in standard MIDI files. Because of the difficulties of polyphonic transcription, we perform matching directly on acoustic features that we extract from MID...
On the Computer Recognition of Solo Piano Music
We present work towards a computer system for the automatic transcription of piano performances. The system takes audio files containing polyphonic piano music as input, and produces MIDI output, representing the pitch, timing and volume of the musical notes. The aim of this work is not to reduce the performance data to common music notation, but to extract the performance parameters for a quan...
Melody Matching Directly From Audio
In this paper we explore a technique for content-based music retrieval using a continuous pitch contour derived from a recording of the audio query instead of a quantization of the query into discrete notes. Our system determines the pitch for each unit of time in the query and then uses a time-warping algorithm to match this string of pitches against songs in a database of MIDI files. This tec...
Extracting Ground-Truth Information from MIDI Files: A MIDIfesto
MIDI files abound and provide a bounty of information for music informatics. We enumerate the types of information available in MIDI files and describe the steps necessary for utilizing them. We also quantify the reliability of this data by comparing it to human-annotated ground truth. The results suggest that developing better methods to leverage information present in MIDI files will facilita...
Polyphonic Audio Matching and Alignment for Music Retrieval
We describe a method that aligns polyphonic audio recordings of music to symbolic score information in standard MIDI files without the difficult process of polyphonic transcription. By using this method, we can search through a MIDI database to find the MIDI file corresponding to a polyphonic audio recording.